3D Television System and Method

Field of the Invention 

[01] This invention relates generally to image processing, and more 
particularly to acquiring, transmitting, and rendering auto-stereoscopic 
images. 

Background of the Invention 

[02] The human visual system gains three-dimensional information in a 
scene from a variety of cues. Two of the most important cues are binocular 
parallax and motion parallax. Binocular parallax refers to seeing a different 
image of the scene with each eye, whereas motion parallax refers to seeing 
different images of the scene when the head is moving. The link between 
parallax and depth perception was shown with the world's first three- 
dimensional display device in 1838. 

[03] Since then, a number of stereoscopic image displays have been 
developed. Three-dimensional displays hold a tremendous potential for 
many applications in entertainment, advertising, information presentation, 
tele-presence, scientific visualization, remote manipulation, and art. 

[04] In 1908, Gabriel Lippmann, who made major contributions to color 
photography and three-dimensional displays, contemplated producing a 
display that provides a "window view upon reality."

[05] Stephen Benton, one of the pioneers of holographic imaging, refined
Lippmann's vision in the 1970s. He set out to design a scalable spatial 
display system with television-like characteristics, capable of delivering full 
color, 3D images with proper occlusion relationships. That display provided 
images with binocular parallax, i.e., stereoscopic images, which can be 
viewed from any viewpoint without special lenses. Such displays are called 
multi-view auto-stereoscopic because they naturally provide binocular and 
motion parallax for multiple viewers. 

[06] A variety of commercial auto-stereoscopic displays are known. Most 
prior systems display binocular or stereo images, although some recently 
introduced systems show up to twenty-four views. However, the 
simultaneous display of multiple perspective views inherently requires a 
very high resolution of the imaging medium. For example, maximum HDTV 
output resolution with sixteen distinct horizontal views requires 
1920 x 1080 x 16 or more than 33 million pixels per output image, which is 
well beyond most current display technologies. 
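
The pixel count quoted above can be checked with a short calculation. The following Python sketch is illustrative only; the function name pixels_per_output_image is hypothetical and is not part of the described system.

```python
def pixels_per_output_image(width: int, height: int, views: int) -> int:
    """Pixels the imaging medium must provide if every view keeps full resolution."""
    return width * height * views


# Maximum HDTV resolution with sixteen distinct horizontal views.
hdtv_16_views = pixels_per_output_image(1920, 1080, 16)
print(f"{hdtv_16_views:,} pixels")  # prints 33,177,600 pixels
```

The count grows linearly with the number of views, which is why a display that multiplexes all views on a single imaging medium quickly exceeds available resolutions.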

[07] It has only recently become feasible to deal with the processing and 
bandwidth requirements for real-time acquisition, transmission, and display 
of such high-resolution content. 

[08] Today, many digital television channels are being transmitted using 
the same bandwidth previously occupied by a single analog channel. This 
has renewed interest in the development of broadcast 3D TV. The Japanese 
3D Consortium and the European ATTEST project have each set out to 
develop and promote I/O devices and distribution mechanisms for 3D TV.
The goal of both groups is to develop a commercially feasible 3D TV
standard that is compatible with broadcast HDTV, and that accommodates 
current and future 3D display technologies. 

[09] However, so far, no fully functional end-to-end 3D TV system has 
been implemented. 

[010] Three-dimensional TV is described in literally thousands of 
publications and patents. Because this work covers various scientific and 
engineering fields, an extensive background is provided. 

[011] Lightfield Acquisition

[012] A lightfield represents radiance as a function of position and direction
in regions of space that are free of occluders. The invention distinguishes
between acquisition of lightfields without scene geometry and model-based 
3D video. 

[013] One object of the invention is to acquire a time-varying lightfield
passing through a 2D optical manifold and to emit the same directional
lightfield through another 2D optical manifold with minimal delay.

[014] Early work in image-based graphics and 3D displays has dealt with 
the acquisition of static lightfields. As early as 1929, a photographic multi- 
camera recording method for large objects, in conjunction with the first 
projection-based 3D display, was described. That system uses a one-to-one 
mapping between photographic cameras and slide projectors.

[015] It is desired to remove that restriction by generating new virtual views
in a display unit with the help of image-based rendering. 

[016] Acquisition of dynamic lightfields has only recently become feasible,
Naemura et al., "Real-time video-based rendering for augmented spatial
communication," Visual Communication and Image Processing, SPIE,
620-631, 1999. They implemented a flexible 4 x 4 lightfield camera, and a more
recent version includes a commercial real-time depth estimation system, 
Naemura et al., "Real-time video-based modeling and rendering of 3d 
scenes," IEEE Computer Graphics and Applications, pp. 66-73, March 
2002. 

[017] Another system uses an array of lenses in front of a special-purpose 
128x128 pixel random-access CMOS sensor, Ooi et al., "Pixel independent 
random access image sensor for real time image-based rendering system," 
IEEE International Conference on Image Processing, vol. II, pp. 193-196, 
2001. The Stanford multi-camera array includes 128 cameras in a 
configurable arrangement, Wilburn et al., "The light field video camera," 
Media Processors 2002, vol. 4674 of SPIE, 2002. There, special-purpose 
hardware synchronizes the cameras and stores the video streams to disk. 

[018] The MIT lightfield camera uses an 8 x 8 array of inexpensive imagers
connected to a cluster of commodity PCs, Yang et al., "A real-time
distributed light field camera," Proceedings of the 13th Eurographics
Workshop on Rendering, Eurographics Association, pp. 77-86, 2002.

[019] All those systems provide some form of image-based rendering for
navigation and manipulation of the dynamic lightfield. 

[020] Model-Based 3D Video 

[021] Another approach to acquire 3D TV content is to use sparsely 
arranged cameras and a model of the scene. Typical scene models range
from a depth map, to a visual hull, to a detailed model of human body
shapes.

[022] In some systems, the video data from the cameras are projected onto 
the model to generate realistic time-varying surface textures. 

[023] One of the largest 3D video studios for virtual reality has over fifty 
cameras arranged in a dome, Kanade et al., "Virtualized reality: 
Constructing virtual worlds from real scenes," IEEE Multimedia, Immersive 
Telepresence, pp. 34-47, January 1997. 

[024] The Blue-C system is one of the few 3D video systems to provide 
real-time capture, transmission, and instantaneous display in a spatially- 
immersive environment, Gross et al., "Blue-C: A spatially immersive 
display and 3d video portal for telepresence," ACM Transactions on 
Graphics, 22, 3, pp. 819-828, 2003. Blue-C uses a centralized processor for 
the compression and transmission of 3D "video fragments." This limits the 
scalability of that system with an increasing number of views. That system 
also acquires a visual hull, which is limited to individual objects, not entire 
indoor or outdoor scenes.

[025] The European ATTEST project acquires HDTV color images with a
depth map for each frame, Fehn et al., "An evolutionary and optimized
approach on 3D-TV," Proceedings of International Broadcast Conference,
pp. 357-365, 2002. 

[026] Some experimental HDTV cameras have already been built, Kawakita 
et al., "High-definition three-dimension camera - HDTV version of an axi- 
vision camera," Tech. Rep. 479, Japan Broadcasting Corp. (NHK), Aug. 
2002. The depth maps can be transmitted as an enhancement layer to 
existing MPEG-2 video streams. The 2D content can be converted using 
depth-reconstruction processes. On the receiver side, stereo-pair or multi- 
view 3D images are generated using image-based rendering. 

[027] However, even with accurate depth maps, it is difficult to render 
multiple high-quality views on the display side because of occlusions or high 
disparity in the scene. Moreover, a single video stream cannot capture 
important view-dependent effects, such as specular highlights. 

[028] Real-time acquisition of depth or geometry for real-world scenes 
remains very difficult. 

[029] Lightfield Compression and Transmission 

[030] Compression and streaming of static lightfields is also known. 
However, very little attention has been paid to the compression and 
transmission of dynamic lightfields. One can distinguish between
all-viewpoint encoding, where all of the lightfield data is available at the display
device, and finite-viewpoint encoding. Finite-viewpoint encoding only 
transmits data that are needed for a particular view by sending information 
from the user back to the cameras. This leads to a reduced transmission
bandwidth, but that encoding is not amenable to 3D TV broadcasting.

[031] The MPEG Ad-Hoc Group on 3D Audio and Video has been formed 
to investigate efficient coding strategies for dynamic lightfields and a
variety of other 3D video scenarios, Smolic et al., "Report on 3dav
exploration," ISO/IEC JTC1/SC29/WG11 Document N5878, July 2003.

[032] Experimental systems for dynamic lightfield coding use motion 
compensation in the time domain, called temporal encoding, or disparity 
prediction between cameras, called spatial encoding, Tanimoto et al., "Ray- 
space coding using temporal and spatial predictions," ISO/IEC 
JTC1/SC29/WG11 Document M10410, December 2003.

[033] Multi-View Auto-stereoscopic Displays: Holographic Displays 

[034] Holography has been known since the beginning of the century. 
Holographic techniques were first applied to image displays in 1962. In that 
system, light from an illumination source is diffracted by interference fringes 
on a holographic surface to reconstruct the light wavefront of the original 
object. A hologram displays a continuous analog light-field, and real-time 
acquisition and display of holograms has long been considered the "holy 
grail" of 3D TV.

[035] Stephen Benton's Spatial Imaging Group at MIT has been pioneering
the development of electronic holography. Their most recent device, the 
Mark-II Holographic Video Display, uses acousto-optic modulators, beam 
splitters, moving mirrors, and lenses to create interactive holograms, St.- 
Hillaire et al., "Scaling up the MIT holographic video system," Proceedings 
of the Fifth International Symposium on Display Holography, SPIE, 1995. 

[036] In more recent systems, moving parts have been eliminated by 
replacing the acousto-optic modulators with LCD, focused light arrays, 
optically-addressed spatial modulators, and digital micro-mirror devices. 

[037] All current holographic video devices use single-color laser light. To 
reduce the size of the display screen, they provide only horizontal parallax.
The display hardware is very large in relation to the size of the image, which 
is typically a few millimeters in each dimension. 

[038] The acquisition of holograms still demands carefully controlled 
physical processes and cannot be done in real-time. At least for the 
foreseeable future it is unlikely that holographic systems will be able to 
acquire, transmit, and display dynamic, natural scenes on large displays. 

[039] Volumetric Displays 

[040] Volumetric displays scan a three-dimensional space, and individually 
address and illuminate voxels. A number of commercial systems for
applications such as air-traffic control and medical and scientific
visualization are now available. However, volumetric systems produce
transparent images that do not provide a fully convincing three-dimensional
experience.
Because of their limited color reproduction and lack of occlusions, 
volumetric displays cannot correctly reproduce the lightfield of a natural 
scene. The design of large-size volumetric displays also poses some difficult 
obstacles. 

[041] Parallax Displays 

[042] Parallax displays emit spatially varying directional light. Much of the 
early 3D display research focused on improvements to Wheatstone's 
stereoscope. F. Ives used a plate with vertical slits as a barrier over an image 
with alternating strips of left-eye/right-eye images, U.S. Patent No. 725,567 
"Parallax stereogram and process for making same," issued to Ives. The 
resulting device is a parallax stereogram. 

[043] To extend the limited viewing angle and restricted viewing position of 
stereograms, narrower slits and smaller pitch can be used between the 
alternating image stripes. These multi-view images are parallax 
panoramagrams. Stereograms and panoramagrams provide only horizontal 
parallax. 

[044] Spherical Lenses 

[045] In 1908, Lippmann described an array of spherical lenses instead of 
slits. This is commonly called a "fly's-eye" lens sheet. The
resulting image is an integral photograph. An integral photograph is a true 
planar lightfield with directionally varying radiance per pixel or 'lenslet'.
Integral lens sheets have been used experimentally with high-resolution
LCDs, Nakajima et al., "Three-dimensional medical imaging display with 
computer-generated integral photography," Computerized Medical Imaging 
and Graphics, 25, 3, pp. 235-241, 2001. The resolution of the imaging 
medium must be very high. For example, a 1024 x 768 pixel output with
four horizontal and four vertical views requires more than 12 million pixels
per output image.

[046] An experimental high-resolution 3D integral video display uses a
3 x 3 projector array, Liao et al., "High-resolution integral videography
auto-stereoscopic display using multi-projector," Proceedings of the Ninth
International Display Workshop, pp. 1229-1232, 2002. Each projector is
equipped with a zoom lens to produce a display with 2872 x 2150 pixels. The
display provides three views with horizontal and vertical parallax. Each
lenslet covers twelve pixels for an output resolution of 240 x 180 pixels.
Special-purpose image-processing hardware is used for geometric image 
warping. 

[047] Lenticular Displays 

[048] Lenticular sheets have been known since the 1930s. A lenticular sheet 
includes a linear array of narrow cylindrical lenses called 'lenticules'. This 
reduces the amount of image data by reducing vertical parallax. Lenticular 
images have found widespread use for advertising, magazine covers, and 
postcards.

[049] Today's commercial auto-stereoscopic displays are based on
variations of parallax barriers, sub-pixel filters, or lenticular sheets placed on 
top of LCD or plasma screens. Parallax barriers generally reduce some of the 
brightness and sharpness of the image. The number of distinct perspective 
views is generally limited. 

[050] For example, the highest-resolution LCDs provide 3840 x 2400 pixels of
resolution. Adding horizontal parallax with, for example, sixteen views
reduces the horizontal output resolution to 240 pixels.

[051] To improve the resolution of a display, H. Ives invented the multi- 
projector lenticular display in 1931 by painting the back of a lenticular sheet 
with diffuse paint and using the sheet as a projection surface for thirty-nine 
slide projectors. Since then, a number of different arrangements of lenticular 
sheets and multi-projector arrays have been described. 

[052] Other techniques in parallax displays include time-multiplexed and 
tracking-based systems. In time-multiplexing, multiple views are projected 
at different time instances using a sliding window or LCD shutter. This 
inherently reduces the frame rate of the display and can lead to noticeable 
flickering. Head-tracking designs focus mostly on the display of high-quality 
stereo image pairs. 

[053] Multi-Projector Displays 

[054] Scalable multi-projector display walls have recently become popular, 
and many systems have been implemented, e.g., Raskar et al., "The office of
the future: A unified approach to image-based modeling and spatially
immersive displays," Proceedings of SIGGRAPH '98, pp. 179-188, 1998.
Those systems offer very high resolution, flexibility, excellent cost- 
performance, scalability, and large-format images. Graphics rendering for 
multi-projector systems can be efficiently parallelized on clusters of PCs. 

[055] Projectors also provide the necessary flexibility to adapt to non-planar 
display geometries. For large displays, multi-projector systems remain the 
only choice for multi-view 3D displays until very high-resolution display 
media, e.g., organic LEDs, become available. However, manual alignment 
of many projectors becomes tedious, and downright impossible in the case 
of non-planar screens or 3D multi-view displays. 

[056] Some systems use cameras and a feedback loop to automatically 
compute relative projector poses for automatic projector alignment. A digital 
camera mounted on a linear 2-axis stage can also be used to align projectors 
for a multi-projector integral display system. 

Summary of the Invention 

[057] The invention provides a system and method for acquiring and 
transmitting 3D images of dynamic scenes in real time. To manage the high 
demands on computation and bandwidth, the invention uses a distributed, 
scalable architecture. 

[058] The system includes an array of cameras, clusters of
network-connected processing modules, and a multi-projector 3D display unit
with a lenticular screen. The system provides stereoscopic color images for
multiple viewpoints without special viewing glasses. Instead of designing 
perfect display optics, we use cameras for the automatic adjustment of the 
3D display. 

[059] The system provides real-time end-to-end 3D TV for the very first 
time in the long history of 3D displays. 

Brief Description of the Drawings 

[060] Figure 1 is a block diagram of a 3D TV system according to the 
invention; 

[061] Figure 2 is a block diagram of decoder modules and consumer 
modules according to the invention; 

[062] Figure 3 is a top view of a display unit with rear projection according 
to the invention; 

[063] Figure 4 is a top view of a display unit with front projection according 
to the invention; and 

[064] Figure 5 is a schematic of horizontal shift between viewer-side and 
projection-side lenticular sheets.

Detailed Description of the Preferred Embodiment

[065] System Architecture

[066] Figure 1 shows a 3D TV system according to our invention. The 
system 100 includes an acquisition stage 101, a transmission stage 102, and 
a display stage 103. 

[067] The acquisition stage 101 includes an array of synchronized video
cameras 110. Small clusters of cameras are connected to producer modules
120. The producer modules capture real-time, uncompressed videos and 
encode the videos using standard MPEG coding to produce compressed 
video streams 121. The producer modules also generate viewing parameters. 

[068] The compressed video streams are sent over a transmission network 
130, which could be broadcast, cable, satellite TV, or the Internet. 

[069] In the display stage 103, the individual video streams are 
decompressed by decoder modules 140. The decoder modules are connected 
by a high-speed network 150, e.g., gigabit Ethernet, to a cluster of consumer 
modules 160. The consumer modules render the appropriate views and send 
output images to a 2D, stereo-pair 3D, or multi-view 3D display unit 310. 

[070] A controller 180 broadcasts the virtual view parameters to the decoder 
modules and the consumer modules, see Figure 2. The controller is also 
connected to one or more cameras 190. The cameras are placed in a
projection area and/or the viewing area. The cameras provide input
capabilities for the display unit. 

[071] Distributed processing is used to make the system 100 scalable in the 
number of acquired, transmitted, and displayed views. The system can be 
adapted to other input and output modalities, such as special-purpose 
lightfield cameras, and asymmetric processing. Note that the overall 
architecture of our system does not depend on the particular type of display 
unit. 

[072] System Operation 
[073] Acquisition Stage 

[074] Each camera 110 acquires a progressive high-definition video in
real-time. For example, we use sixteen color cameras with 1310 x 1030, 8 bits
per pixel CCD sensors. The cameras are connected by an IEEE-1394 'FireWire'
high performance serial bus 111 to the producer modules 120.

[075] The maximum transmitted frame rate at full resolution is, e.g., twelve
frames per second. Two cameras are connected to each one of eight producer 
modules. All modules in our prototype have 3 GHz Pentium 4 processors, 2 
GB of RAM, and run Windows XP. It should be noted that other processors 
and software can be used. 

[076] Our cameras 110 have an external trigger that allows complete control 
over video synchronization. We use a PCI card with custom programmable
logic devices (CPLD) to generate the synchronization signals 112 for the
cameras 110. Although it is possible to build camera arrays with software
synchronization, we prefer precise hardware synchronization for dynamic 
scenes. 

[077] Because our 3D display shows horizontal parallax only, we arranged 
the cameras 110 in a regularly spaced linear and horizontal array. In general,
the cameras 110 can be arranged arbitrarily because we are using
image-based rendering in the consumer modules to synthesize new views, as
described below. Ideally, the optical axis of each camera is perpendicular to 
a common camera plane, and an 'up vector' of each camera is aligned with 
the vertical axis of the camera. 

[078] In practice, it is impossible to align multiple cameras precisely. We 
use standard calibration procedures to determine the intrinsic, i.e., focal 
length, radial distortion, color calibration, etc., and extrinsic, i.e., rotation 
and translation, camera parameters. The calibration parameters are broadcast 
as part of the video stream as viewing parameters, and the relative 
differences in camera alignment can be handled by rendering corrected 
views in the display stage 103. 

[079] A densely spaced array of cameras provides the best lightfield 
capture, but high-quality reconstruction filters can be used when the 
lightfield is undersampled. 

[080] A large number of cameras can be placed in a TV studio. A subset of
cameras can be selected by a user, either a camera operator or a viewer, with
a joystick to display a moving 2D/3D window of the scene to provide a
free-viewpoint video.

[081] Transmission Stage 

[082] Transmitting sixteen uncompressed video streams with 1310x1030 
resolution and 24 bits per pixel at 30 frames per second requires 14.4 Gb/sec 
bandwidth, which is well beyond current broadcast capabilities. There are 
two basic design choices for compression and transmission of dynamic 
multi-view video data. Either the data from multiple cameras are compressed 
using spatial or spatio-temporal encoding, or each video stream is 
compressed individually using temporal encoding. Temporal encoding also 
uses spatial encoding within each frame, but not between views. 
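
The uncompressed bandwidth figure can be reproduced with a short calculation. The following Python sketch is illustrative only; the function name is hypothetical, and reading the quoted 14.4 Gb/sec as binary gigabits (2^30 bits) is an assumption, not something the text states.

```python
def raw_bitrate_bits_per_sec(streams: int, width: int, height: int,
                             bits_per_pixel: int, fps: int) -> int:
    """Aggregate bit rate of several uncompressed video streams."""
    return streams * width * height * bits_per_pixel * fps


bits = raw_bitrate_bits_per_sec(streams=16, width=1310, height=1030,
                                bits_per_pixel=24, fps=30)
gbit_binary = bits / 2**30   # assumption: 1 Gb taken as 2**30 bits
gbit_decimal = bits / 1e9    # 1 Gb taken as 10**9 bits

print(f"{gbit_binary:.2f}")   # 14.48
print(f"{gbit_decimal:.2f}")  # 15.54
```

Under either convention the raw rate is on the order of 15 Gb/sec, which is why per-stream compression is needed before transmission.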

[083] The first option offers higher compression, because there is a high 
coherence between the views. However, higher compression requires that 
multiple video streams are compressed by a centralized processor. This 
compression-hub architecture is not scalable, because the addition of more 
views eventually overwhelms the internal bandwidth of the encoders. 

[084] Consequently, we use temporal encoding of individual video streams 
on distributed processors. This strategy has other advantages. Existing 
broadband protocols and compression standards do not need to be changed. 
Our system is compatible with the conventional digital TV broadcast 
infrastructure and can co-exist in perfect harmony with 2D TV.

[085] Currently, digital broadcast networks carry hundreds of channels and
perhaps a thousand or more channels with MPEG-4. This makes it possible
to dedicate any number of channels, e.g., sixteen, to 3D TV. Note, however, 
that our preferred transmission strategy is broadcasting. 

[086] Other applications, e.g., peer-to-peer 3D video conferencing, can also 
be enabled by our system. Another advantage of using existing 2D coding 
standards is that the decoder modules on the receiver are well established 
and widely available. Alternatively, the decoder modules 140 can be 
incorporated in a digital TV 'set-top' box. The number of decoder modules 
can depend on whether the display is 2D or multi-view 3D. 

[087] Note that our system can adapt to other 3D TV compression 
algorithms, as long as multiple views can be encoded, e.g., into 2D video 
plus depth maps, transmitted, and decoded in the display stage 103.

[088] Eight producer modules are connected by gigabit Ethernet to eight 
consumer modules 160. Video streams at full camera resolution 
(1310 x 1030) are encoded with MPEG-2 and immediately decoded by the
producer modules. This essentially corresponds to a broadband network with 
a very large bandwidth and almost no delay. 

[089] The gigabit Ethernet 150 provides all-to-all connectivity between the 
decoder modules and the consumer modules, which is important for our 
distributed rendering and display implementation.

[090] Display Stage

[091] The display stage 103 generates appropriate images to be displayed on 
the display unit 310. The display unit can be a multi-view 3D unit, a head- 
mounted 2D stereo unit, or a conventional 2D unit. To provide this 
flexibility, the system needs to be able to provide all possible views, i.e., the 
entire lightfield, to the end users at every time instance. 

[092] The controller 180 requests one or more virtual views by specifying 
viewing parameters, such as position, orientation, field-of-view, and focal 
plane, of virtual cameras. The parameters are then used to render the output 
images accordingly. 

[093] Figure 2 shows the decoder modules and consumer modules in greater 
detail. The decoder modules 140 decompress 141 the compressed videos 121
to uncompressed source frames 142, and store the current decompressed frames
in virtual video buffers (VVB) 162 via the network 150. Each consumer 160
has a VVB storing data of all current decoded frames, i.e., all acquired views
at a particular time instance.

[094] The consumer modules 160 generate an output image 164 for the 
output video by processing image pixels from multiple frames in the VVBs
162. Due to bandwidth and processing limitations, it is impossible for each
consumer module to receive the complete source frames from all the decoder 
modules. This would also limit the scalability of the system. The key 
observation is that the contributions of the source frames to the output image 
of each consumer can be determined in advance. We now focus on the
processing for one particular consumer, i.e., one particular virtual view and
its corresponding output image. 

[095] For each pixel o(u, v) in the output image 164, the controller 180 
determines a view number v and the position (x, y) of each source pixel s(v,
x, y) that contributes to the output pixel. Each camera has an associated
unique view number for this purpose, e.g., 1 to 16. We use unstructured
lumigraph rendering to generate output images from the incoming video 
streams 121. 

[096] Each output pixel is a linear combination of k source pixels: 

$$ o(u, v) = \sum_{i=0}^{k} w_i \, s(v, x, y). \qquad (1) $$

[097] Blending weights w_i can be predetermined by the controller based on
the virtual view information. The controller sends the positions (x, y) of the k
source pixels (s) to each decoder v for pixel selection 143. An index c of a 
requesting consumer module is sent to the decoder for pixel routing 145 
from the decoder modules to the consumer module. 
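
The per-pixel blending of equation (1) can be sketched as follows. This is a minimal illustration in Python, not the described implementation: the tuple layout (view number, x, y, weight), the function name blend_output_pixel, and the toy grayscale frames are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical table entry for one contributing source pixel:
# (view number, x, y, blending weight).
SourceRef = Tuple[int, int, int, float]


def blend_output_pixel(refs: List[SourceRef],
                       frames: Dict[int, List[List[float]]]) -> float:
    """Weighted sum of the selected source pixels, as in equation (1)."""
    return sum(w * frames[view][y][x] for view, x, y, w in refs)


# Toy data: three 2x2 grayscale source frames, keyed by view number.
frames = {
    1: [[10.0, 20.0], [30.0, 40.0]],
    2: [[50.0, 60.0], [70.0, 80.0]],
    3: [[90.0, 100.0], [110.0, 120.0]],
}
# Three source pixels with predetermined weights that sum to one.
refs = [(1, 0, 0, 0.5), (2, 1, 0, 0.25), (3, 1, 1, 0.25)]

value = blend_output_pixel(refs, frames)
print(value)  # 50.0
```

Because the entries depend only on the virtual view, they can be computed once and reused for every frame until the view changes.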

[098] Optionally, multiple pixels can be buffered in the decoder for pixel 
block compression 144, before the pixels are sent over the network 150. The 
consumer module decompresses 161 the pixel blocks and stores each pixel 
in VVB number v at position (x, y).

Pfister et al.
MERL-1538

[099] Each output pixel requires pixels from k source frames. That means
that the maximum bandwidth on the network 150 to the VVB is k times the
size of the output image times the number of frames per second (fps). For
example, for k = 3, 30 fps and HDTV output resolution, e.g., 1280 x 720 at
12 bits per pixel, the maximum bandwidth is 118 MB/sec. This can be
substantially reduced when the pixel block compression 144 is used, at the 
expense of more processing. To provide scalability, it is important that this 
bandwidth is independent of the total number of transmitted views, which is 
the case in our system. 
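
The bandwidth bound for one consumer module can be checked numerically. The following Python sketch is illustrative only; the function name is hypothetical, and reading the quoted figure as binary megabytes (2^20 bytes) is an assumption.

```python
def buffer_bandwidth_bytes_per_sec(k: int, width: int, height: int,
                                   bits_per_pixel: int, fps: int) -> float:
    """Upper bound on traffic into one consumer's virtual video buffers.

    Note that the total number of transmitted views is not a parameter.
    """
    bytes_per_output_image = width * height * bits_per_pixel / 8
    return k * bytes_per_output_image * fps


bw = buffer_bandwidth_bytes_per_sec(k=3, width=1280, height=720,
                                    bits_per_pixel=12, fps=30)
print(f"{bw / 2**20:.2f}")  # 118.65 (assumption: 1 MB taken as 2**20 bytes)
print(f"{bw / 1e6:.1f}")    # 124.4  (1 MB taken as 10**6 bytes)
```

The bound depends only on k, the output resolution, and the frame rate, which matches the scalability argument: adding more cameras does not increase the traffic into any single consumer module.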

[0100] The processing in each consumer module 160 is as follows. The 
consumer module determines equation (1) for each output pixel. The weights 
w_i are predetermined and stored in a lookup table (LUT) 165. The memory
requirement of the LUT 165 is k times the size of the output image 164. In
our example above, this corresponds to 4.3 MB. 

[0101] Assuming lossless pixel block compression, consumer modules 
can easily be implemented in hardware. That means that the decoder 
modules 140, network 150, and consumer modules can be combined on one 
printed circuit board, or manufactured as an application-specific integrated 
circuit (ASIC). 

[0102] We are using the term pixel loosely. It typically means one
pixel, but it could also be an average of a small, rectangular block of pixels.
Other known filters can be applied to a block of pixels to produce a single 
output pixel from multiple surrounding input pixels. 

[0103] Combining 163 pre-filtered blocks of the source frames for new
effects, such as depth-of-field, is novel for image-based rendering.
In particular, we can efficiently perform multi-view rendering of pre-filtered
images by using summed-area tables. The pre-filtered (summed) blocks of
pixels are then combined using equation (1) to form output pixels.
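
A summed-area table lets the average of any rectangular block be read with four lookups. The following Python sketch shows the standard construction as an illustration; the text does not specify how its tables are laid out, so the zero-border layout and the function names here are assumptions.

```python
from typing import List


def summed_area_table(img: List[List[float]]) -> List[List[float]]:
    """sat[y][x] is the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def block_average(sat: List[List[float]],
                  x0: int, y0: int, x1: int, y1: int) -> float:
    """Mean over columns x0..x1-1 and rows y0..y1-1 in constant time."""
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))


img = [[1.0, 2.0, 3.0],
       [4.0, 5.0, 6.0],
       [7.0, 8.0, 9.0]]
sat = summed_area_table(img)
avg = block_average(sat, 1, 1, 3, 3)  # block containing 5, 6, 8, 9
print(avg)  # 7.0
```

Each block average produced this way can then play the role of a single source pixel in the weighted sum of equation (1).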

[0104] We can also use higher-quality blending, e.g., for undersampled
lightfields. So far, the requested virtual views are static. Note, however, that
all the source views are sent over the network 150. The controller 180 can
dynamically update the lookup tables 165 for pixel selection 143, routing
145, and combining 163. This enables navigation of the lightfield that is
similar to real-time lightfield cameras with random-access image sensors, and
frame buffers in the receiver.

[0105] Display Unit 

[0106] As shown in Figure 3, for a rear-projection arrangement, the 
display unit is constructed as a lenticular screen 310. We use sixteen 
projectors to display the output videos on the display unit, with 1024x768 
output resolution. Note that the resolution of the projectors can be less than 
the resolution of our acquired and transmitted video, which is 1310x1030 
pixels. 

[0107] The two key parameters of lenticular sheets 310 are the field-of- 
view (FOV) and the number of lenticules per inch (LPI); see also Figures 4 
and 5. The lenticular sheets measure 6x4 feet, with a 30° FOV and 
15 LPI. The optical design of the lenticules is optimized for multi-view 3D 
display. 
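
As a rough worked example of what the LPI figure implies, the sketch below computes the lenticule pitch and the number of lenticules across the sheet. It assumes that the 6-foot dimension is the horizontal one and that the lenticules run vertically; the function names are hypothetical.

```python
MM_PER_INCH = 25.4


def lenticule_pitch_mm(lpi: float) -> float:
    """Width of one lenticule in millimeters for a given lenticules-per-inch value."""
    return MM_PER_INCH / lpi


def lenticule_count(width_inches: float, lpi: float) -> int:
    """Number of lenticules across a sheet of the given width."""
    return round(width_inches * lpi)


pitch = lenticule_pitch_mm(15)        # about 1.69 mm per lenticule
count = lenticule_count(6 * 12, 15)   # 6 feet = 72 inches
```

Under these assumptions, each lenticule is about 1.69 mm wide and the sheet holds 1080 lenticules side by side.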

[0108] As shown in Figure 3, the lenticular sheet 310 for rear- 
projection displays includes a projector-side lenticular sheet 301, a viewer- 
side lenticular sheet 302, a diffuser 303, and substrates 304 between the 
lenticular sheets and diffuser. The two lenticular sheets 301-302 are mounted 
back-to-back on the substrates 304 with the optical diffuser 303 in the 
center. We use a flexible rear-projection fabric. 

[0109] The back-to-back lenticular sheets and the diffuser are 
composited into a single structure. To align the lenticules of the two sheets 
as precisely as possible, a transparent resin is used. The resin is UV- 
hardened and aligned. 

[0110] The projection-side lenticular sheet 301 acts as a light 
multiplexer, focusing the projected light as thin vertical stripes onto the 
diffuser, or a reflector 403 for front-projection, see Figure 4 below. 
Considering each lenticule to be an ideal pinhole camera, the stripes on the 
diffuser / reflector capture the view-dependent radiance of a three- 
dimensional lightfield, i.e., 2D position and azimuth angle. 

[0111] The viewer-side lenticular sheet acts as a light de-multiplexer 
and projects the view-dependent radiance back to a viewer 320. 

[0112] Figure 4 shows an alternative arrangement 400 for a front- 
projection display. The lenticular sheet 410 for front-projection displays 
includes a projector-side lenticular sheet 401, a reflector 403, and a substrate 
404 between the lenticular sheet and the reflector. The lenticular sheet 401 is 
mounted using the substrate 404 and the optical reflector 403. We use a 
flexible front-projection fabric. 

[0113] Ideally, the arrangement of the cameras 110 and the 
arrangement of the projectors 171, with respect to the display unit, are 
substantially identical. An offset in the vertical direction between 
neighboring projectors may be necessary for mechanical mounting reasons, 
which can lead to a small loss of vertical resolution in the output image. 

[0114] As shown in Figure 5, a viewing zone 501 of a lenticular 
display is related to the field-of-view (FOV) 502 of each lenticule. The 
whole viewing area, i.e., 180 degrees, is partitioned into multiple viewing 
zones. In our case, the FOV is 30°, leading to six viewing zones. Each 
viewing zone corresponds to sixteen sub-pixels 510 on the diffuser 303. 

[0115] If the viewer 320 moves from one viewing zone to the next, a 
sudden image 'shift' 520 appears. The shift occurs because at the border of 
the viewing zone, we move from the 16th sub-pixel of one lenticule to the 
first sub-pixel of a neighboring lenticule. Furthermore, a translation of the 
lenticular sheets with respect to each other leads to a change, i.e., apparent 
rotation, of the viewing zones. 
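
The zone arithmetic and the shift at a zone border can be sketched as follows. The sketch assumes an idealized, linear mapping from azimuth angle to sub-pixel index, with the angle measured from 0 to 180 degrees across the viewing area. The function names and the indexing direction are hypothetical.

```python
def num_viewing_zones(fov_deg: float, total_deg: float = 180.0) -> int:
    """Number of viewing zones when the whole viewing area is split by the FOV."""
    return int(total_deg // fov_deg)


def locate_view(angle_deg: float, fov_deg: float = 30.0, views_per_zone: int = 16):
    """Map an azimuth angle in [0, 180) to a 0-based (zone, sub-pixel) pair."""
    zone = int(angle_deg // fov_deg)
    frac = (angle_deg - zone * fov_deg) / fov_deg   # position inside the zone
    sub = min(int(frac * views_per_zone), views_per_zone - 1)
    return zone, sub


zones = num_viewing_zones(30.0)        # six zones for a 30 degree FOV
degrees_per_view = 30.0 / 16           # angular width of one sub-pixel view
before = locate_view(29.9)             # last sub-pixel of the first zone
after = locate_view(30.0)              # first sub-pixel of the next zone
```

Crossing the border at 30 degrees moves the index from the last sub-pixel of one zone to the first sub-pixel of the next, which corresponds to the sudden image shift described above. With sixteen views per zone, each view spans 1.875 degrees in this idealized model.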

[0116] The viewing zone of our system is very large. We estimate the 
depth-of-field ranges from about two meters in front of the display to well 
beyond fifteen meters. As the viewer moves away, the binocular parallax 
decreases, while the motion parallax increases. We attribute this to the fact 
that the viewer sees multiple views simultaneously if the display is in the 
distance. Consequently, even small head movements lead to large 
motion parallax. To increase the size of the viewing zones, lenticular sheets 
with a wider FOV and more LPI can be used. 

[0117] A limitation of our 3D display is that it provides only horizontal 
parallax. We believe that this is not a serious issue, as long as the viewer 
remains static. This limitation can be corrected by using integral lens sheets 
and two-dimensional camera and projector arrays. Head tracking can also be 
incorporated to display images with some vertical parallax on our lenticular 
screen. 

[0118] Our system is not restricted to using lenticular sheets with the 
same LPI on the projection and viewer side. One possible design has twice 
the number of lenticules on the projector side. A mask on top of the diffuser 
can cover every other lenticule. The sheets are offset such that a lenticule 
on the projector side provides the image for one lenticule on the viewing 
side. Other multi-projector displays with integral sheets or curved-mirror 
retro-reflection are possible as well. 

[0119] We can also add vertically aligned projectors with diffusing 
filters of different strengths, e.g., dark, medium, and bright. Then, we can 
change the output brightness for each view by mixing pixels from different 
projectors. 

[0120] Our 3D TV system can also be used for point-to-point 
transmission, such as in video conferencing. 

[0121] We can also adapt our system to multi-view display units with 
deformable display media, such as organic LEDs. If we know the orientation 
and relative position of each display unit, then we can render new virtual 
views by dynamically routing image information from the decoder modules 
to the consumers. 

[0122] Among other applications, this allows the design of "invisibility 
cloaks" by displaying view-dependent images on an object using 
deformable display media, e.g., miniature multi-projectors pointed at front- 
projection fabric draped around the object, or small organic LEDs and 
lenslets that are mounted directly on the object surface. This "invisibility 
cloak" shows view-dependent images that would be seen if the object were 
not present. For dynamically changing scenes one can put multiple miniature 
cameras around or on the object to acquire the view-dependent images that 
are then displayed on the "invisibility cloak." 

[0123] Effect of the Invention 

[0124] We provide a 3D TV system with a scalable architecture for 
distributed acquisition, transmission, and rendering of dynamic lightfields. A 
novel distributed rendering method allows us to interpolate new views using 
little computation and moderate bandwidth. 

[0125] Although the invention has been described by way of examples 
of preferred embodiments, it is to be understood that various other 
adaptations and modifications may be made within the spirit and scope of 
the invention. Therefore, it is the object of the appended claims to cover all 
such variations and modifications as come within the true spirit and scope of 
the invention. 